indefinite kernel
$L_1$-norm Regularized Indefinite Kernel Logistic Regression
Kernel methods represent a fundamental class of machine learning techniques and have gained widespread adoption across diverse domains [32], including computer vision [22, 13], natural language processing (NLP) [36, 4], and bioinformatics [29], among others. The core idea underlying kernel methods is to employ a kernel function that implicitly maps the input data into a high-dimensional feature space, thereby enabling the use of linear models to solve nonlinear learning tasks in the original space. Consequently, the selection of an appropriate kernel function is critical to the performance of the method. Traditional kernel methods predominantly rely on positive definite (PD) kernels, such as the polynomial kernel and the Gaussian kernel. According to Mercer's Theorem, a PD kernel ensures that the resulting kernel matrix is positive semidefinite (PSD), thereby facilitating the analysis of the learning problem within the framework of reproducing kernel Hilbert spaces (RKHS) [9]. The PSD property guarantees that the corresponding optimization problem is convex and thus tractable. These authors contributed equally to this work.
Reviews: Wasserstein Weisfeiler-Lehman Graph Kernels
The main motivation of this work is based on the fact that conventional graph kernels loose information in their embedding and/or aggregation steps. While we agree with the authors on this point, it is not clear what is the information lost with the proposed WWL graph kernel. Since the proposed method is based on the WL subtree kernel, then it has the same weaknesses as it. Moreover, it may have more issues, such as the non-uniqueness of the embedding, the iterative operations related to hashingโฆ The part "To ensure the theoretical correctness of our resultsโฆ" is confusing and misleading. On a first reading, the reader may understand that the theoretical results are not correct.
Support Vector Machine Classification with Indefinite Kernels
In this paper, we propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our method simultaneously finds the support vectors and a proxy kernel matrix used in computing the loss. This can be interpreted as a robust classification problem where the indefinite kernel matrix is treated as a noisy observation of the true positive semidefinite kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the analytic center cutting plane method. We compare the performance of our technique with other methods on several data sets.
Analysis of SVM with Indefinite Kernels
The recent introduction of indefinite SVM by Luss and dAspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one.
Coefficient-based Regularized Distribution Regression
Mao, Yuan, Shi, Lei, Guo, Zheng-Chu
In this paper, we consider the coefficient-based regularized distribution regression which aims to regress from probability measures to real-valued responses over a reproducing kernel Hilbert space (RKHS), where the regularization is put on the coefficients and kernels are assumed to be indefinite. The algorithm involves two stages of sampling, the first stage sample consists of distributions and the second stage sample is obtained from these distributions. Asymptotic behaviors of the algorithm in different regularity ranges of the regression function are comprehensively studied and learning rates are derived via integral operator techniques. We get the optimal rates under some mild conditions, which matches the one-stage sampled minimax optimal rate. Compared with the kernel methods for distribution regression in the literature, the algorithm under consideration does not require the kernel to be symmetric and positive semi-definite and hence provides a simple paradigm for designing indefinite kernel methods, which enriches the theme of the distribution regression. To the best of our knowledge, this is the first result for distribution regression with indefinite kernels, and our algorithm can improve the saturation effect.
Fast Learning in Reproducing Kernel Krein Spaces via Generalized Measures
Liu, Fanghui, Huang, Xiaolin, Chen, Yingyi, Suykens, Johan A. K.
In this paper, we attempt to solve a long-lasting open question in non-positive definite (non-PD) kernels: does a given non-PD kernel can be decomposed into the difference of two PD kernels (termed as positive decomposition)? We consider this question in a distribution view by introducing the \emph{signed measure}, which transforms positive decomposition to measure decomposition: a series of non-PD kernels can be associated with the linear combination of specific finite Borel measures. In this manner, our distribution-based framework provides a sufficient and necessary condition to answer this open question. And this answer is also computationally implementable in practice to scale non-PD kernels in large sample cases, which allows us to devise the first random features algorithm to obtain an unbiased estimator. Interestingly, our framework shows that the popular neural tangent kernel (NTK) of a two-layer ReLU network on the unit sphere is non-PD, which expands the usage of classical non-PD kernels. Experimental results on several benchmark datasets verify the effectiveness of our algorithm over the existing methods.
Analysis of Least Squares Regularized Regression in Reproducing Kernel Krein Spaces
Liu, Fanghui, Shi, Lei, Huang, Xiaolin, Yang, Jie, Suykens, Johan A. K.
In this paper, we study the asymptotical properties of least squares regularized regression with indefinite kernels in reproducing kernel Kre\u{\i}n spaces (RKKS). The classical approximation analysis cannot be directly applied to study its asymptotical behavior under the framework of learning theory as this problem is in essence non-convex and outputs stationary points. By introducing a bounded hyper-sphere constraint to such non-convex regularized risk minimization problem, we theoretically demonstrate that this problem has a globally optimal solution with a closed form on the sphere, which makes our approximation analysis feasible in RKKS. Accordingly, we modify traditional error decomposition techniques, prove convergence results for the introduced hypothesis error based on matrix perturbation theory, and derive learning rates of such regularized regression problem in RKKS. Under some conditions, the derived learning rates in RKKS are the same as that in reproducing kernel Hilbert spaces (RKHS), which is actually the first work on approximation analysis of regularized learning algorithms in RKKS.
Graph Neural Networks with Composite Kernels
Zhou, Yufan, Xian, Jiayi, Chen, Changyou, Xu, Jinhui
Learning on graph structured data has drawn increasing interest in recent years. Frameworks like Graph Convolutional Networks (GCNs) have demonstrated their ability to capture structural information and obtain good performance in various tasks. In these frameworks, node aggregation schemes are typically used to capture structural information: a node's feature vector is recursively computed by aggregating features of its neighboring nodes. However, most of aggregation schemes treat all connections in a graph equally, ignoring node feature similarities. In this paper, we re-interpret node aggregation from the perspective of kernel weighting, and present a framework to consider feature similarity in an aggregation scheme. Specifically, we show that normalized adjacency matrix is equivalent to a neighbor-based kernel matrix in a Krein Space. We then propose feature aggregation as the composition of the original neighbor-based kernel and a learnable kernel to encode feature similarities in a feature space. We further show how the proposed method can be extended to Graph Attention Network (GAT). Experimental results demonstrate better performance of our proposed framework in several real-world applications.
Support Vector Machine Classification with Indefinite Kernels
Luss, Ronny, D', aspremont, Alexandre
In this paper, we propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our method simultaneously finds the support vectors and a proxy kernel matrix used in computing the loss. This can be interpreted as a robust classification problem where the indefinite kernel matrix is treated as a noisy observation of the true positive semidefinite kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the analytic center cutting plane method. We compare the performance of our technique with other methods on several data sets.
Analysis of SVM with Indefinite Kernels
Ying, Yiming, Campbell, Colin, Girolami, Mark
The recent introduction of indefinite SVM by Luss and dAspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one.